Reinforcement Learning Without Rewards
Author
Abstract
Machine learning can be broadly defined as the study and design of algorithms that improve with experience. Reinforcement learning is a variety of machine learning that makes minimal assumptions about the information available for learning and, in a sense, defines the problem of learning in the broadest possible terms. Reinforcement learning algorithms are usually applied to “interactive” problems, such as learning to drive a car, operate a robotic arm, or play a game. In reinforcement learning, an autonomous agent must learn how to behave in an unknown, uncertain, and possibly hostile environment, using only the sensory feedback that it receives from the environment. As the agent moves from one state of the environment to another, it receives only a reward signal; there is no human “in the loop” to tell the algorithm exactly what to do. The goal in reinforcement learning is to learn an optimal behavior that maximizes the total reward the agent collects.

Despite its generality, the reinforcement learning framework makes one strong assumption: that the reward signal can always be directly and unambiguously observed. In other words, the feedback a reinforcement learning algorithm receives is assumed to be part of the environment in which the agent operates, and is included in the agent’s experience of that environment. In practice, however, rewards are usually specified manually by the practitioner applying the learning algorithm, and designing a reward function that elicits the desired behavior from the agent can be a subtle and frustrating problem. Our main focus in this thesis is the design and analysis of reinforcement learning algorithms that do not require complete knowledge of the rewards. The contributions of this thesis can be divided into three main parts:

• In Chapters 2 and 3, we review the theory of two-player zero-sum games and present a novel analysis of existing no-regret algorithms for solving these games. Our results show that no-regret algorithms can be used to compute strategies in games that satisfy a much stronger definition of optimality than is commonly used.

• In Chapters 4 and 5, we present new algorithms for apprenticeship learning, a generalization of reinforcement learning in which the true rewards are unknown. The algorithms described in Chapter 5 leverage the game-theoretic results from Chapters 2 and 3.

• In Chapter 6, we show how partial knowledge of the rewards can be used to accelerate imitation learning, an alternative to reinforcement learning in which the goal is to imitate another agent in the environment.

In summary, we design and analyze several new algorithms for reinforcement learning that do not require access to a fully observable or fully accurate reward signal, and in doing so add considerable flexibility to the traditional reinforcement learning framework.
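The game-theoretic thread above can be illustrated with a minimal sketch (this is an illustration of the standard technique, not code from the thesis): when both players of a two-player zero-sum matrix game run a no-regret update such as multiplicative weights against each other, their time-averaged strategies approximate a minimax equilibrium. The payoff matrix (matching pennies), the step size `eta`, and the iteration count below are illustrative choices.

```python
import numpy as np

# Row player's payoffs in matching pennies (zero-sum):
# +1 on a match, -1 on a mismatch; the column player receives the negation.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

def solve_zero_sum(A, iterations=5000, eta=0.05):
    """Approximate a minimax equilibrium by multiplicative-weights self-play.

    Both players run the multiplicative-weights (Hedge) no-regret update;
    by the standard folk theorem, the time-averaged strategies form an
    approximate equilibrium, with error shrinking as the average regret does.
    """
    n_rows, n_cols = A.shape
    w_row = np.ones(n_rows)
    w_row[0] = 2.0                 # break the initial symmetry
    w_col = np.ones(n_cols)
    avg_row = np.zeros(n_rows)
    avg_col = np.zeros(n_cols)
    for _ in range(iterations):
        p = w_row / w_row.sum()    # row player's current mixed strategy
        q = w_col / w_col.sum()    # column player's current mixed strategy
        avg_row += p
        avg_col += q
        # Expected payoff of each pure strategy against the opponent's mix.
        row_payoffs = A @ q        # row player maximizes these
        col_payoffs = p @ A        # column player minimizes these
        w_row *= np.exp(eta * row_payoffs)
        w_col *= np.exp(-eta * col_payoffs)
    return avg_row / iterations, avg_col / iterations

p, q = solve_zero_sum(A)
# For matching pennies the unique equilibrium mixes 50/50, and the
# averaged strategies land close to it with a game value near zero.
```

The averaging step is essential: the instantaneous strategies of no-regret dynamics can cycle around the equilibrium indefinitely, while their running averages converge to it.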